-
Notifications
You must be signed in to change notification settings - Fork 2.6k
fix(vendor)!: direct extraction for registry sources #15514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Except this doesn't guarantee things are pristine until we vendor the contents of the |
The chance of this happening is low because This patch aims to be as simple as possible. The “direct extraction” is the correct fix but require more architectural changes. |
And that we delete the extracted So looking over this again, this does not fix #15090. This PR only changes which files we list, it doesn't change how we copy them which is what is causing the problem in #15090.
Is it all that big of an architectural change? Can we get an instance of the registry source and and call an extract method directly on it? |
How does it not fix #15090? cargo-vendor doesn't change any content of a registry package, at least
Maybe not really an architectural change, but we'll need to have an abstraction over local/remote/http registry (which is RegistryData/RegistrySource). Ensure |
c6773a8
to
a89f513
Compare
a89f513
to
b09d3de
Compare
Anyway, I got direct extraction working. Doesn't look too terrible than I thought! |
@rustbot note benchmark-result Also, the benchmark result shows that it is ~60% faster when vendoring cargo and its dependencies. On Apple M1 Pro with 500GiB APFS SSD
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
I have not gone over this with a fine-toothed comb, but the overall structure looks good to me.
I'm not going to click merge right now in case anyone else wants to look closer, or you had any additional concerns or changes you were thinking of. If there isn't more progress by sometime next week, then I think it would be fine to just merge.
b2fbd65
to
0a5e0fc
Compare
0a5e0fc
to
c2d3bc4
Compare
This is early enough in the release that I'm going to go ahead and merge this now. |
/// Filters files we want to vendor. | ||
/// | ||
/// `relative` is a path relative to the package root. | ||
fn vendor_this(relative: &Path) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we filter out .cargo_vcs_info.json
and explicitly bring it back in?
Also, we're closing #11000 without having a test for that case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it rearranged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was confusing. I approved your updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is supposed to be vendored when we do direct extraction.
`PathSource::list_files` has some heurstic rules for listing files. Those rules are mainly designed for `cargo package`. Previously, cargo-vendor relies on those rules to understand what files to vendor. However, it shouldn't use those rules because: * Package extracted from a `.crate` tarball isn't Git-controlled, some rules may apply differently. * The extracted package already went through `PathSource::list_files` during packaging. It should be clean enough. * Should keep crate sources from registry sources in a pristine state, which is exactly what vendoring is meant for. Instead, we switch to direct extraction into the vendor directory to ensure source code is the same as in the `.crate` tarball. There are some caveats: * The overwrite protection in `unpack_package` assumes the unpack directory is always `<pkg>-<version`>. We don't want to remove this, but for cargo-vendor supports vendoring without version suffix. For that case, we need a temporary staging area, and move the unpacked source then. * The heurstic in `PathSource::list_files` did something "good" in general cases, like excluding hidden directories. That means common directorys like `.github` or `.config` won't be vendored. After this, those get included. This is another round of churns. We might want to get other `cargo-vendor` changes along with this in one single release.
To make the behavior change outstanding for fixing rust-lang/rust-lang#11000 this is split into a separate commit.
By doing direct extractions, source code is kept in its original state The old workaround is irrelevant now.
Not sure if it is really needed, though Cargo had better follow what platform support says. > The behavior on Windows is the same on Windows 10 1607 and higher > if `FileRenameInfoEx` is supported by the filesystem; otherwise, > `from` can be anything, but `to` must *not* be a directory. https://doc.rust-lang.org/1.86.0/src/std/fs.rs.html#2430-2435
c2d3bc4
to
0290e40
Compare
Pull Request is not mergeable
### What does this PR try to resolve? This is meant to fixes #13191 As git sources and registry sources are considered immutable. I don't think there is any reason excluding those files. There might be a little chance local Git repositories might have those, though that is a rare use case. Alternatively, we could reject all `.rej`/`.orig` files but `Cargo.toml.orig`. ### How should we test and review this PR? Test updates should be sufficient. ### Additional information This is a follow-up of #15514
What does this PR try to resolve?
PathSource::list_files
has some heurstic rules for listing files. Those rules are mainly designed forcargo package
.Previously, cargo-vendor relies on those rules to understand what files to vendor. However, it shouldn't use those rules because:
.crate
tarball isn't Git-controlled, some rules may apply differently.PathSource::list_files
during packaging. It should be clean enough.Instead, we switch to direct extraction into the vendor directory
to ensure source code is the same as in the
.crate
tarball.How should we test and review this PR?
There is already a test
vendor::package_exclude
for the fix of #9054,though now I think it is not a good fix. The test change shows the correct behavior change.
I am not sure if we want more tests.
Also, there are some caveats with this fix:
unpack_package
assumes the unpackdirectory is always
<pkg>-<version
>.We don't want to remove this,
but for cargo-vendor supports vendoring without version suffix.
For that case, we need a temporary staging area,
and move the unpacked source then.
PathSource::list_files
did something "good" ingeneral cases, like excluding hidden directories. That means
common directories like
.github
or.config
won't be vendored.After this, those get included. This is another round of churns.
We might want to get other
cargo-vendor
changes along with thisin one single release.
Additional information
cargo vendor
vscargo package
inconsistency #9054cargo vendor
does not include license-file if not in package list #9575cargo vendor
doesn't generate.cargo_vcs_info.json
#11000cargo vendor
misses some files #14034cargo vendor
loses some nonempty directories #15080cargo vendor
should vendor crates in a pristine state #15090This also opens a door for
Cargo.toml
can break crates with deps. that readCargo.toml
#13191Since we are changing vendored source (again), we might want to remove the
.rej
/.orig
special rules in cargo-vendor, as well as look into the new source-dedup vendor dir layout.Summary Notes
Generated by triagebot, see help for how to add more